{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Huggingface Sagemaker-sdk - Deploy 🤗 Transformers for inference\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Welcome to this getting started guide, we will use the new Hugging Face Inference DLCs and Amazon SageMaker Python SDK to deploy a transformer model for inference.  \n",
    "In this example we directly deploy one of the 10 000+ Hugging Face Transformers from the [Hub](https://huggingface.co/models) to Amazon SageMaker for Inference."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## API - [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit)\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Using the `transformers pipelines`, we designed an API, which makes it easy for you to benefit from all `pipelines` features. The API is oriented at the API of the [🤗  Accelerated Inference API](https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html), meaning your inputs need to be defined in the `inputs` key and if you want additional supported `pipelines` parameters you can add them in the `parameters` key. Below you can find examples for requests. \n",
    "\n",
    "**text-classification request body**\n",
    "```python\n",
    "{\n",
    "\t\"inputs\": \"Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days.\"\n",
    "}\n",
    "```\n",
    "**question-answering request body**\n",
    "```python\n",
    "{\n",
    "\t\"inputs\": {\n",
    "\t\t\"question\": \"What is used for inference?\",\n",
    "\t\t\"context\": \"My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference.\"\n",
    "\t}\n",
    "}\n",
    "```\n",
    "**zero-shot classification request body**\n",
    "```python\n",
    "{\n",
    "\t\"inputs\": \"Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!\",\n",
    "\t\"parameters\": {\n",
    "\t\t\"candidate_labels\": [\n",
    "\t\t\t\"refund\",\n",
    "\t\t\t\"legal\",\n",
    "\t\t\t\"faq\"\n",
    "\t\t]\n",
    "\t}\n",
    "}\n",
    "```"
   ],
   "metadata": {}
  },
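  {
   "cell_type": "markdown",
   "source": [
    "The request bodies above all follow the same shape: a required `inputs` key and an optional `parameters` key. As a minimal sketch, they could be built and serialized in Python as shown here. The `build_request` helper is hypothetical and not part of the toolkit or the SageMaker SDK; it only illustrates the structure.\n",
    "\n",
    "```python\n",
    "import json\n",
    "from typing import Any, Dict, Optional\n",
    "\n",
    "\n",
    "def build_request(inputs: Any, parameters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:\n",
    "    '''Return a request body with the required inputs key and an optional parameters key.'''\n",
    "    if inputs is None:\n",
    "        raise ValueError('inputs is required')\n",
    "    body = {'inputs': inputs}\n",
    "    # Only add the parameters key when extra pipeline parameters are given\n",
    "    if parameters:\n",
    "        body['parameters'] = parameters\n",
    "    return body\n",
    "\n",
    "\n",
    "# zero-shot classification: inputs plus candidate_labels in parameters\n",
    "zero_shot = build_request(\n",
    "    'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!',\n",
    "    parameters={'candidate_labels': ['refund', 'legal', 'faq']},\n",
    ")\n",
    "\n",
    "# question-answering: inputs is itself a dict with question and context\n",
    "qa = build_request({\n",
    "    'question': 'What is used for inference?',\n",
    "    'context': 'My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference.',\n",
    "})\n",
    "\n",
    "# Serialize to the JSON string that is sent to the endpoint\n",
    "print(json.dumps(zero_shot, indent=2))\n",
    "```\n",
    "\n",
    "Keeping `parameters` out of the body when it is empty mirrors the text-classification and question-answering examples, which send only `inputs`."
   ],
   "metadata": {}
  },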
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "!pip install \"sagemaker>=2.48.0\" --upgrade"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Deploy one of the 10 000+ Hugging Face Transformers to Amazon SageMaker for Inference\n",
    "\n",
    "_This is an experimental feature, where the model will be loaded after the endpoint is created. This could lead to errors, e.g. models > 10GB_\n",
    "\n",
    "To deploy a model directly from the Hub to SageMaker we need to define 2 environment variables when creating the `HuggingFaceModel` . We need to define:\n",
    "\n",
    "- `HF_MODEL_ID`: defines the model id, which will be automatically loaded from [huggingface.co/models](http://huggingface.co/models) when creating or SageMaker Endpoint. The 🤗 Hub provides +10 000 models all available through this environment variable.\n",
    "- `HF_TASK`: defines the task for the used 🤗 Transformers pipeline. A full list of tasks can be find [here](https://huggingface.co/transformers/main_classes/pipelines.html)."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "from sagemaker.huggingface import HuggingFaceModel\n",
    "import sagemaker \n",
    "\n",
    "role = sagemaker.get_execution_role()\n",
    "\n",
    "# Hub Model configuration. https://huggingface.co/models\n",
    "hub = {\n",
    "  'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad', # model_id from hf.co/models\n",
    "  'HF_TASK':'question-answering' # NLP task you want to use for predictions\n",
    "}\n",
    "\n",
    "# create Hugging Face Model Class\n",
    "huggingface_model = HuggingFaceModel(\n",
    "   env=hub,\n",
    "   role=role, # iam role with permissions to create an Endpoint\n",
    "   transformers_version=\"4.6\", # transformers version used\n",
    "   pytorch_version=\"1.7\", # pytorch version used\n",
    "   py_version=\"py36\", # python version of the DLC\n",
    ")"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# deploy model to SageMaker Inference\n",
    "predictor = huggingface_model.deploy(\n",
    "   initial_instance_count=1,\n",
    "   instance_type=\"ml.m5.xlarge\"\n",
    ")"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# example request, you always need to define \"inputs\"\n",
    "data = {\n",
    "\"inputs\": {\n",
    "    \"question\": \"What is used for inference?\",\n",
    "    \"context\": \"My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference.\"\n",
    "    }\n",
    "}\n",
    "\n",
    "# request\n",
    "predictor.predict(data)"
   ],
   "outputs": [],
   "metadata": {}
  },
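  {
   "cell_type": "markdown",
   "source": [
    "The `predictor.predict(data)` call above goes through the SageMaker Python SDK. As a rough sketch, the same request could also be sent from an application that does not have the `predictor` object, using the boto3 `sagemaker-runtime` client. The `invoke` helper below is hypothetical and only for illustration; it assumes the endpoint accepts and returns `application/json`.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "\n",
    "def invoke(runtime_client, endpoint_name, data):\n",
    "    '''Send a JSON request to a SageMaker endpoint and decode the JSON response.'''\n",
    "    response = runtime_client.invoke_endpoint(\n",
    "        EndpointName=endpoint_name,\n",
    "        ContentType='application/json',\n",
    "        Accept='application/json',\n",
    "        Body=json.dumps(data),\n",
    "    )\n",
    "    # The response Body is a byte stream that contains the JSON result\n",
    "    return json.loads(response['Body'].read())\n",
    "\n",
    "\n",
    "# Usage, assuming boto3 is installed, AWS credentials are configured,\n",
    "# and the endpoint created by deploy() is still running:\n",
    "#\n",
    "#   import boto3\n",
    "#   runtime = boto3.client('sagemaker-runtime')\n",
    "#   invoke(runtime, predictor.endpoint_name, data)\n",
    "```\n",
    "\n",
    "Passing the client in as an argument keeps the helper easy to test with a stub and leaves region and credential handling to the caller."
   ],
   "metadata": {}
  },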
  {
   "cell_type": "code",
   "execution_count": 8,
   "source": [
    "# delete endpoint\n",
    "predictor.delete_endpoint()"
   ],
   "outputs": [],
   "metadata": {}
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "c281c456f1b8161c8906f4af2c08ed2c40c50136979eaae69688b01f70e9f4a9"
  },
  "kernelspec": {
   "display_name": "Python 3.8.5 64-bit ('hf': conda)",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": ""
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}